Try nuking ShardLayout::V0 #12313

eagr · 2024-10-25T09:47:06Z

No description provided.

eagr · 2024-10-25T09:54:30Z

can you guys do something like cargo test -p near-chain-configs without dependency issues? @wacban

wacban · 2024-10-25T10:28:16Z

> can you guys do something like cargo test -p near-chain-configs without dependency issues? @wacban

It fails for me actually, that's not great. I typically run it on the whole workspace and just filter to the tests that I want. Also, we use the nextest framework rather than cargo test, though I have no clue as to why. It's suboptimal but I never bothered to optimize this part of my workflow.

cargo nextest run <test>

wacban · 2024-10-25T10:29:05Z

If you feel like fixing it, go for it. It looks like it's only a matter of adding some dependencies to the cargo file.

wacban · 2024-10-25T10:29:57Z

JFYI this PR is marked as draft, please mark it as ready for review when it is.

eagr · 2024-10-26T09:07:05Z

> If you feel like fixing it, go for it. It looks like it's only a matter of adding some dependencies to the cargo file.

It seems like this is expected behavior. If it's not bothering anyone else, not sure if it needs fixing. And it could be easily mitigated by adding an --all-features flag to the command.

eagr · 2024-10-28T07:05:27Z

core/primitives/src/shard_layout.rs

+    }
+
+    /// Construct a layout with given number of shards
+    pub fn of_num_shards(num_shards: NumShards, version: ShardVersion) -> Self {


got a better idea for the fn name?

Maybe multi_shard, just to match the single_shard one?

That was my first thought but it could also be used to create a single-shard layout, so I changed my mind. But if you like that name I'm also down with it. :)

eagr · 2024-10-28T07:13:04Z

nearcore/src/config.rs

@@ -1087,7 +1075,7 @@ pub fn create_localnet_configs_from_seeds(
        .map(|seed| InMemorySigner::from_seed("node".parse().unwrap(), KeyType::ED25519, seed))
        .collect::<Vec<_>>();

-    let shard_layout = ShardLayout::v0(num_shards, 0);
+    let shard_layout = ShardLayout::of_num_shards(num_shards, 0);


This would cause some sanity checks to fail, as you can see from the CI logs. It seems like a JSON parsing issue. Not sure whether you'd like to keep it as it was or to update the config somewhere else to make it work.

Let's try updating the config and if it doesn't work leave as is.

eagr · 2024-10-28T07:17:51Z

integration-tests/src/tests/client/features/stateless_validation.rs

-    let error_message = format!("{}", error).to_lowercase();
-    tracing::info!(target: "test", "error message: {}", error_message);
-    assert!(error_message.contains("shard"));
+    let _res = env.clients[0].process_chunk_state_witness(witness, witness_size, None, signer);


There's a panic from get_shard_index() after switching to V2.

Ah that's pretty bad. Feel free to either:

1. Fix it (may be complicated / lots of code if you need to add error handling)
2. Leave as is but put a TODO(wacban) in there instead of FIXME and I will have a look.
3. Make the default shard layout V1 (hopefully this works?)

I'll try 3 (should probably work from the look of the code) which seems like a nice middle ground before finishing transition to V2
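
For the "fix it" option, one possible shape of the error handling is a lookup that returns a `Result` instead of panicking. This is only a minimal sketch and not nearcore code: the `Layout` struct, the `ShardLayoutError` type, and the `u64`/`usize` stand-ins for the shard id and shard index are all hypothetical, and the real `get_shard_index()` signature may differ.

```rust
use std::collections::BTreeMap;
use std::fmt;

// Hypothetical stand-ins for nearcore's shard id and shard index types.
type ShardId = u64;
type ShardIndex = usize;

/// Hypothetical error type; not taken from nearcore.
#[derive(Debug, PartialEq)]
enum ShardLayoutError {
    InvalidShardId(ShardId),
}

impl fmt::Display for ShardLayoutError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ShardLayoutError::InvalidShardId(id) => write!(f, "invalid shard id {}", id),
        }
    }
}

impl std::error::Error for ShardLayoutError {}

/// Simplified layout that only keeps the id -> index mapping.
struct Layout {
    id_to_index_map: BTreeMap<ShardId, ShardIndex>,
}

impl Layout {
    /// Returns an error for an unknown shard id instead of panicking.
    fn get_shard_index(&self, shard_id: ShardId) -> Result<ShardIndex, ShardLayoutError> {
        self.id_to_index_map
            .get(&shard_id)
            .copied()
            .ok_or(ShardLayoutError::InvalidShardId(shard_id))
    }
}
```

With a lookup like this, a caller can propagate the failure with `?`, and a test can again assert on the error message rather than discarding the result.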

wacban

looks nice, answered some questions

wacban · 2024-10-28T09:32:55Z

core/chain-configs/src/genesis_config.rs

+    // FIXME eagr what should be the default?
+    #[default(ShardLayout::v0(1, 0))]


Ideally it should use the single_shard method that returns the most recent (today it's V2) shard layout.

nit: The convention here seems to be to use the default_ function to provide the default value.

wacban · 2024-10-28T09:34:36Z

core/primitives/src/shard_layout.rs

+    }
+
+    /// Construct a layout with given number of shards
+    pub fn of_num_shards(num_shards: NumShards, version: ShardVersion) -> Self {


Maybe multi_shard, just to match the single_shard one?

wacban · 2024-10-28T09:39:29Z

integration-tests/src/tests/client/features/stateless_validation.rs

-    let error_message = format!("{}", error).to_lowercase();
-    tracing::info!(target: "test", "error message: {}", error_message);
-    assert!(error_message.contains("shard"));
+    let _res = env.clients[0].process_chunk_state_witness(witness, witness_size, None, signer);


Ah that's pretty bad. Feel free to either:

1. Fix it (may be complicated / lots of code if you need to add error handling)
2. Leave as is but put a TODO(wacban) in there instead of FIXME and I will have a look.
3. Make the default shard layout V1 (hopefully this works?)

wacban · 2024-10-28T09:44:16Z

nearcore/src/config.rs

@@ -1087,7 +1075,7 @@ pub fn create_localnet_configs_from_seeds(
        .map(|seed| InMemorySigner::from_seed("node".parse().unwrap(), KeyType::ED25519, seed))
        .collect::<Vec<_>>();

-    let shard_layout = ShardLayout::v0(num_shards, 0);
+    let shard_layout = ShardLayout::of_num_shards(num_shards, 0);


Let's try updating the config and if it doesn't work leave as is.

wacban · 2024-10-28T09:45:56Z

tools/database/src/corrupt.rs

@@ -21,7 +21,6 @@ impl CorruptStateSnapshotCommand {
        let mut store_update = store.store_update();
        // TODO(resharding) automatically detect the shard version
        let shard_layout = match self.shard_layout_version {
-            0 => ShardLayout::v0(1, 0),


Can you keep this one?

wacban

looks good,

I think serde doesn't like (de)serializing maps with non-string keys, like the ones in V2, and it breaks the tests. Feel free to fall back to V1 if it's too crazy to fix in this PR.

wacban · 2024-10-29T09:08:02Z

core/primitives/src/shard_layout.rs

-    #[test]
-    fn test_shard_layout_v0() {
-        let num_shards = 4;
-        let shard_layout = ShardLayout::v0(num_shards, 0);
-        let mut shard_id_distribution: HashMap<ShardId, _> =
-            shard_layout.shard_ids().map(|shard_id| (shard_id.into(), 0)).collect();
-        let mut rng = StdRng::from_seed([0; 32]);
-        for _i in 0..1000 {
-            let s: Vec<u8> = (&mut rng).sample_iter(&Alphanumeric).take(10).collect();
-            let s = String::from_utf8(s).unwrap();
-            let account_id = s.to_lowercase().parse().unwrap();
-            let shard_id = account_id_to_shard_id(&account_id, &shard_layout);
-            assert!(shard_id < num_shards);
-            *shard_id_distribution.get_mut(&shard_id).unwrap() += 1;
-        }
-        let expected_distribution: HashMap<ShardId, _> = [
-            (ShardId::new(0), 247),
-            (ShardId::new(1), 268),
-            (ShardId::new(2), 233),
-            (ShardId::new(3), 252),
-        ]
-        .into_iter()
-        .collect();
-        assert_eq!(shard_id_distribution, expected_distribution);
-    }


Please keep this one, the V0 may still be used when replaying some very old blocks.

eagr · 2024-10-29T12:54:41Z

> I think serde doesn't like (de)serializing maps with non-string keys, like the ones in V2, and it breaks the tests. Feel free to fall back to V1 if it's too crazy to fix in this PR.

Then I guess it needs a custom de/serializer that converts the keys to strings and back. I'll give it a shot if it's not too complicated.

wacban · 2024-10-30T13:10:22Z

JFYI I had a look at the test failure in CI. It seems like something somewhere has the shard layout version hard coded to 0 where in your PR you (correctly) use the provided version. It's a bit wild, I'll keep digging.

wacban · 2024-10-30T13:39:02Z

nearcore/src/config.rs

-            } else {
-                ShardLayout::v0_single_shard()
-            };
+            let shards = ShardLayout::multi_shard(num_shards, 3);


To fix the runtime-params-estimator test you can set the version here to 0. It's suboptimal and definitely buggy but I don't think it's worth properly debugging this rather old test framework.

Oh my it breaks a bunch of other tests. I guess it will be easier to fix it here after all.

> To fix the runtime-params-estimator test you can set the version here to 0.

done

codecov · 2024-10-30T18:31:40Z

Codecov Report

Attention: Patch coverage is 85.03937% with 19 lines in your changes missing coverage. Please review.

Project coverage is 71.15%. Comparing base (8e30ccd) to head (1c502af).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/primitives/src/shard_layout.rs | 82.52% | 6 Missing and 12 partials ⚠️ |
| chain/chain/src/test_utils.rs | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #12313      +/-   ##
==========================================
- Coverage   71.19%   71.15%   -0.05%     
==========================================
  Files         839      839              
  Lines      169743   169831      +88     
  Branches   169743   169831      +88     
==========================================
- Hits       120851   120844       -7     
- Misses      43633    43717      +84     
- Partials     5259     5270      +11

| Flag | Coverage Δ | |
|---|---|---|
| backward-compatibility | `0.16% <0.00%> (-0.01%)` | ⬇️ |
| db-migration | `0.16% <0.00%> (-0.01%)` | ⬇️ |
| genesis-check | `1.27% <48.67%> (+0.04%)` | ⬆️ |
| integration-tests | `38.99% <49.60%> (-0.01%)` | ⬇️ |
| linux | `70.58% <85.03%> (-0.07%)` | ⬇️ |
| linux-nightly | `70.73% <85.03%> (-0.05%)` | ⬇️ |
| macos | `50.40% <83.46%> (-0.03%)` | ⬇️ |
| pytests | `1.57% <49.55%> (+0.03%)` | ⬆️ |
| sanity-checks | `1.38% <48.67%> (+0.04%)` | ⬆️ |
| unittests | `64.13% <83.46%> (-0.02%)` | ⬇️ |
| upgradability | `0.21% <0.00%> (-0.01%)` | ⬇️ |


eagr · 2024-10-30T19:08:44Z

The failing test could be fixed by adding some feature flags. But that's from master, does it need to be fixed here?

wacban

Looks good to me, thank you for this contribution! Just a few final nits. Most are optional, I only really care about restoring the assertion in the resharding test.

wacban · 2024-10-31T08:37:27Z

core/primitives/src/shard_layout.rs

+            shards_split_map: None,
+            shards_parent_map: None,
+            version,
+        })
    }

    /// Return a V0 Shardlayout


Can you mark it as deprecated? I don't know how to do this properly in Rust; if it's not straightforward then a comment should do.

How about marking ShardLayout::V0 as deprecated? This way any usage of V0 would raise a deprecation warning including calling v0().

How about both? :)
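
Doing "both" could look roughly like the sketch below, which uses Rust's built-in `#[deprecated]` attribute on an enum variant and on a constructor. The enum here is a simplified stand-in with the variant payloads omitted, and the note texts are illustrative assumptions rather than the wording used in the PR.

```rust
/// Simplified stand-in for the real enum; the variant payloads are omitted.
#[derive(Debug, PartialEq)]
pub enum ShardLayout {
    /// Any code that names this variant gets a deprecation warning.
    #[deprecated(note = "V0 may only be needed for replaying very old blocks")]
    V0,
    V1,
    V2,
}

impl ShardLayout {
    /// Deprecated constructor: callers get a warning at compile time.
    /// The `allow` keeps the body itself quiet when it names the V0 variant.
    #[deprecated(note = "use ShardLayout::single_shard() instead")]
    #[allow(deprecated)]
    pub fn v0_single_shard() -> Self {
        ShardLayout::V0
    }

    /// Returns the most recent layout version (V2 in this sketch).
    pub fn single_shard() -> Self {
        ShardLayout::V2
    }
}
```

Call sites that still need V0 on purpose, such as a database tool or a test, can opt out locally with `#[allow(deprecated)]`, which keeps the remaining uses easy to grep for.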

wacban · 2024-10-31T08:37:59Z

chain/chain/src/resharding/event_type.rs

-        // Shard layouts V0 and V1 are rejected.
-        assert!(ReshardingEventType::from_shard_layout(
-            &ShardLayout::v0_single_shard(),
-            block,
-            prev_block
-        )
-        .is_err());


Can we keep this?

wacban · 2024-10-31T08:38:52Z

chain/epoch-manager/src/tests/mod.rs

@@ -2294,7 +2294,7 @@ fn test_protocol_version_switch_with_shard_layout_change() {
        epoch_manager.get_epoch_info(&epochs[1]).unwrap().protocol_version(),
        new_protocol_version - 1
    );
-    assert_eq!(epoch_manager.get_shard_layout(&epochs[1]).unwrap(), ShardLayout::v0_single_shard(),);
+    assert_eq!(epoch_manager.get_shard_layout(&epochs[1]).unwrap(), ShardLayout::single_shard(),);


mini nit: remove the trailing comma

wacban · 2024-10-31T08:42:12Z

core/primitives/src/shard_layout.rs

+        let id_to_index_map =
+            layout.id_to_index_map.iter().map(|(k, v)| (k.to_string(), *v)).collect();


This is completely fine but I would be tempted to write a generic function that converts a Map<ShardId, T> to Map<String, T> and use it for all the maps in the shard layout. We're looking at whopping potential savings of ~2 lines of code so up to you if you think it's worth it :)

wacban · 2024-10-31T08:46:16Z

core/primitives/src/shard_layout.rs

+        let id_to_index_map = layout
+            .id_to_index_map
+            .into_iter()
+            .map(|(k, v)| Ok((k.parse::<u64>()?.into(), v)))
+            .collect::<Result<_, Self::Error>>()?;


Ditto about generic function for this but here it may actually make some sense because it's less trivial logic.
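
The two generic helpers suggested in these comments might be sketched as follows. This is an illustration under assumptions, not the code from the PR: `ShardId` is a hypothetical `u64` newtype, `BTreeMap` is assumed as the map type, and the helper names `to_string_keys` and `from_string_keys` are made up for the example.

```rust
use std::collections::BTreeMap;
use std::fmt;
use std::num::ParseIntError;

// Hypothetical stand-in for nearcore's ShardId newtype.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct ShardId(u64);

impl From<u64> for ShardId {
    fn from(id: u64) -> Self {
        ShardId(id)
    }
}

impl fmt::Display for ShardId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Serialization direction: ShardId keys become their decimal string form.
fn to_string_keys<T: Clone>(map: &BTreeMap<ShardId, T>) -> BTreeMap<String, T> {
    map.iter().map(|(k, v)| (k.to_string(), v.clone())).collect()
}

/// Deserialization direction: string keys are parsed back into ShardId.
/// Stops at the first key that is not a valid u64 and returns that error.
fn from_string_keys<T>(
    map: BTreeMap<String, T>,
) -> Result<BTreeMap<ShardId, T>, ParseIntError> {
    map.into_iter()
        .map(|(k, v)| k.parse::<u64>().map(|id| (ShardId::from(id), v)))
        .collect()
}
```

The parsing direction is the one that benefits most from sharing, since every map needs the same fallible key conversion and the same error propagation.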

wacban · 2024-10-31T08:47:37Z

core/primitives/src/shard_layout.rs

+impl TryFrom<SerdeShardLayoutV2> for ShardLayoutV2 {
+    type Error = Box<dyn std::error::Error + Send + Sync>;
+
+    fn try_from(layout: SerdeShardLayoutV2) -> Result<Self, Self::Error> {


mini nit: Maybe unpack the layout as a first step and then use the unpacked values directly? It may be a bit prettier and it would be more obvious that there isn't unnecessary cloning.
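
A minimal sketch of what unpacking first could look like, using trimmed-down hypothetical types (`SerdeLayout` and `Layout` with two fields each, and `ParseIntError` as the error type) rather than the real `SerdeShardLayoutV2` and `ShardLayoutV2`:

```rust
use std::collections::BTreeMap;
use std::convert::TryFrom;
use std::num::ParseIntError;

/// Hypothetical, trimmed-down serialized form: shard ids are string keys.
struct SerdeLayout {
    id_to_index_map: BTreeMap<String, usize>,
    version: u32,
}

/// Hypothetical, trimmed-down in-memory form: shard ids are u64 keys.
#[derive(Debug, PartialEq)]
struct Layout {
    id_to_index_map: BTreeMap<u64, usize>,
    version: u32,
}

impl TryFrom<SerdeLayout> for Layout {
    type Error = ParseIntError;

    fn try_from(layout: SerdeLayout) -> Result<Self, Self::Error> {
        // Unpack once: every field is moved out of `layout`, so nothing is
        // cloned, and a newly added field that is not listed here becomes
        // a compile error instead of being silently ignored.
        let SerdeLayout { id_to_index_map, version } = layout;

        // Parse each string key back into a numeric shard id.
        let id_to_index_map: BTreeMap<u64, usize> = id_to_index_map
            .into_iter()
            .map(|(k, v)| k.parse::<u64>().map(|id| (id, v)))
            .collect::<Result<_, _>>()?;

        Ok(Layout { id_to_index_map, version })
    }
}
```

Because the destructuring pattern names every field, the compiler checks both that each value is consumed exactly once and that the conversion stays in sync with the struct definition.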

wacban · 2024-10-31T08:49:01Z

core/primitives/src/shard_layout.rs

+    }
+}
+
+impl<'de> serde::Deserialize<'de> for ShardLayoutV2 {


nit: I think the convention is to call the lifetime 'a.

eagr · 2024-10-31T11:58:24Z

core/primitives/src/shard_layout.rs

+            shards_split_map: None,
+            shards_parent_map: None,
+            version,
+        })
    }

    /// Return a V0 Shardlayout


How about marking ShardLayout::V0 as deprecated? This way any usage of V0 would raise a deprecation warning including calling v0().

eagr · 2024-10-31T12:01:21Z

core/primitives/src/shard_layout.rs

+    }
+
+    /// Can be used to construct a multi-shard layout, mostly for test purposes
+    pub fn multi_shard(num_shards: NumShards, version: ShardVersion) -> Self {


how about n_shard() in the sense of creating an N-shard layout?

I'm not a fan tbh. How about just new or new_test?

wacban · 2024-10-31T12:05:51Z

I tried the pre-merge tests and unfortunately some are failing. Those are the most expensive tests that only run before merging to master. I tried a simple debug but I couldn't fix it easily. I'm afraid we may need to restore kv_runtime to use v0 for now to make it pass. I'm testing this change again on a fork from your PR:
38ca3a1
test run reference for myself:
https://nayduck.nearone.org/#/run/564

eagr requested a review from a team as a code owner October 25, 2024 09:47

eagr requested a review from Longarithm October 25, 2024 09:47

eagr marked this pull request as draft October 25, 2024 09:50

eagr force-pushed the deprec-shard-v0 branch from 44c2b52 to 264738b Compare October 25, 2024 09:59

eagr force-pushed the deprec-shard-v0 branch from 0d367db to e8a78cc Compare October 27, 2024 04:58

v0_single_shard() -> single_shard()

d709cf9

eagr force-pushed the deprec-shard-v0 branch 3 times, most recently from 02b02f4 to 7f64b44 Compare October 28, 2024 04:50

try removing v0()

7ba337a

eagr force-pushed the deprec-shard-v0 branch from 7f64b44 to 7ba337a Compare October 28, 2024 04:58

create_localnet_configs_from_seeds()

4133145

eagr marked this pull request as ready for review October 28, 2024 06:18

init_configs()

4067758

eagr force-pushed the deprec-shard-v0 branch from e93bd9a to 4067758 Compare October 28, 2024 06:24

wacban reviewed Oct 28, 2024

eagr added 3 commits October 29, 2024 04:31

todo for failing tests

5b7d17e

bring back V0 for database tools

af3296a

update genesis_config.json

2a2710d

wacban reviewed Oct 29, 2024

fix V2 serialization

28b24b0

wacban reviewed Oct 29, 2024

eagr added 4 commits October 29, 2024 23:55

of_num_shards -> multi_shard

753019f

bring back test_shard_layout_v0()

af9a154

Merge branch 'master' into deprec-shard-v0

76f0c78

glitch

99bb9ca

wacban reviewed Oct 30, 2024

eagr added 4 commits October 31, 2024 01:26

SerdeShardLayoutV2

67522cf

try init layout of version 0

52f1812

update genesis config

0bf8d43

Merge branch 'master' into deprec-shard-v0

1c502af

wacban approved these changes Oct 31, 2024
